Abstract

We propose a novel algorithm that outputs the final standings of a soccer league, based on a simple
dynamics that mimics a soccer tournament. In our model, a team is created with a defined potential
(ability), which is updated during the tournament according to the results of previous games.
The updated potential modifies a team's future winning/losing probabilities. We show that this
evolutionary game is able to reproduce the statistical properties of the final standings of actual
editions of the Brazilian tournament (Brasileirao). However, other leagues such as the Italian
(Calcio) and the Spanish (La Liga) tournaments have notoriously non-Gaussian traces and cannot be
straightforwardly reproduced by this evolutionary non-Markovian model. A complete understanding of
these phenomena deserves much more attention, but we suggest a simple explanation based on
data collected in Brazil: there, several teams were crowned champion in previous editions,
corroborating that the champion typically emerges from random fluctuations that partly preserve the
Gaussian traces during the tournament. On the other hand, in the Italian and Spanish leagues only
a few teams in recent history have won their league tournaments. These leagues are based on more
robust and hierarchical structures established even before the beginning of the tournament. For
the sake of completeness, we also elaborate a totally Gaussian model (which equalizes the winning,
drawing, and losing probabilities) and we show that it cannot reproduce the scores of the Brazilian
tournament "Brasileirao". This shows that the evolutionary aspects are not superfluous
in our modeling and have an important role, which must be considered in other alternative models.
Finally, we analyse the distortions of our model in situations where a large number of teams is
considered, showing the existence of a transition from a single to a double peaked histogram of
the final classification scores. An interesting scaling is presented for different sized tournaments.



Preprint submitted to Elsevier 



July 10, 2012 



1. Introduction 



Soccer is an extremely popular and profitable multi-billion dollar business around the world.
Recently, several aspects regarding the sport and associated businesses have been the subject of
investigation by the scientific community, including physicists who have devoted some work and
time to describing statistics related to soccer. In the literature about soccer models, one can find
applications of complex networks [1] and fits with generalized functions [2]; however, they often
have only one focus: goal distribution (see e.g. [3, 4, 5]). Outside the soccer literature,
it is important to mention other interesting studies which do not necessarily focus on the scores
of the games, such as models that investigate properties of patterns emerging from failure/success
processes in sports. In the case of basketball, it has been suggested [6] that the "hot hand"
phenomenon (the belief that during a particular period a player's performance is significantly better
than expected on the basis of the player's overall record), an apparently non-random pattern, can be
modeled by a sequence of random independent trials. Returning to soccer, some authors [7] have
devoted attention to the influence of the perceptual-motor bias associated with reading direction on
foul judgment by referees.

However, it is interesting to notice that there is a void in the literature: few studies have been 
carried out under the game theoretic approach of considering the outcome of a tournament from 
a simple dynamics among the competing teams. In other words, in looking at the statistics that 
emerge from this complex system called soccer, one can ask if the properties of the distribution 
of final tournament classification points can be seen as an emerging property of a soccer tourna- 
ment dynamics established by simple rules among the different competing teams, or how these 
classification point distributions emerge from a soccer tournament by considering all "combats" 
among the teams. Here, we propose a model that combines previous studies concerning goal dis- 
tribution [5] and a game theoretic approach to football tournaments that produces realistic final 
tournament scores and standings. 

In this paper, we explore the statistics of standing points at the end of tournaments played
according to the "Double Round Robin System" (DRRS; see footnote 1), in which the team with the most
tournament points at the end of the season is crowned the champion, since many soccer tournament
tables around the world are based on this well-known system. In general, 20 teams take part in
first-tier tournaments such as "Serie A" in Italy, the English "Premier League", the Spanish "La
Liga" and the Brazilian "Brasileirao" (from 2003 onwards). During the course
of a season, each team plays every other team twice: the "home" and "away" games. Moreover,
the points awarded in each match follow the 3-1-0 points system: teams receive three points for a
win and one point for a draw; no points are awarded for a loss. The Serie A Italian soccer
tournament, or simply the "Calcio", has been played since 1898, but only from 1929 was it disputed in
its current format and system. Its main champions have been Juventus, winner of the league 27
times, and Milan and Internazionale, which won the league 18 times each. The Spanish "La Liga"
also started in 1929, and over its history the tournament has been widely dominated by only two
teams: Real Madrid and Barcelona.

1 http://en.wikipedia.org/wiki/Round-robin_tournament

In Brazil, the national tournament, popularly known as "Brasileirao", was first organized in a
modern format in 1971. In 2010 the Brazilian Soccer Confederation (CBF) recognized as national
champions the winners of smaller national tournaments such as the "Taca Brasil" (played from
1959 to 1968) and another tournament known as "Roberto Gomes Pedrosa" (played from 1967
to 1970). However, only in 2003 did the Brazilian league start being played under the DRRS. In
all earlier editions of the tournament the league table was based on the method of preliminaries,
typically used in tennis tournaments, which will not be considered in this paper. In the 10 editions
played under the DRRS, the Brazilian tournament has already been won by 6 different football
clubs: Cruzeiro, Santos, Sao Paulo, Corinthians, Flamengo, and Fluminense.

The statistics as well as the fluctuations associated to the standings and scores of teams in 
tournaments with 20 teams playing under the DRRS can be very interesting. Moreover, if we 
are able to reproduce such statistics via a simple automaton considering the teams as "agents" 
which evolve according to definite "rules" based on their previous performances and conditions, 
one could use this information when preparing or building up a team before a competition. Thus, 
models (e.g. automata) of games in a tournament, whose results are defined by the evolving 
characteristics of the teams, could provide important knowledge. Therefore, by exploring the 
conditions under which the standing and scores of tournaments can be mimicked by a model, we 
propose a simple, but very illustrative, evolutionary non-Markovian process. 

It is known that many events can alter the performance of teams during a season besides their
initial strengths, such as the hiring of a new player, renewed motivation due to a change in coach,
key player injuries, and trading of players, among others. For the sake of simplicity, we consider that
the teams in the model initially have the same chance of winning the games and that the
combination of events that can lead to an improvement of a team will be modeled solely by increasing
the probability of a team winning future games after a victory. Similarly, a loss should negatively
affect its future winning probabilities.

Our main goal is to verify whether the Brazilian soccer tournament has final standing scores with the
same statistical properties that emerge from our simple model, and to check whether the properties
of the Brazilian tournament differ from other leagues and, if so, the reasons for that behavior. In
the first part of the paper we calibrate our model by using constant draw probabilities introduced ad
hoc, based on data from real tournaments. In the second part, we use draw probabilities that
emerge from the model dynamics, being dependent on the teams' "abilities". Both situations are
able to reproduce real tournament data. The advantage of the second approach is that it needs no
extra parameters, whereas the first one uses pre-calculated rates from previous statistics. In addition,
we analyze distortions of our model under hypotheses of inflated tournaments. Finally, we show
a transition from single to double peaked histograms of final standing scores, which occurs as
we go from small leagues to large tournaments. Nevertheless, it is possible to obtain a scaling for
tournaments of different sizes.

2. A first Model: ad-hoc draw probabilities 

In our model, each team starts with a potential $\varphi_i(0) = \varphi_0$, where $i = 1, \ldots, n$ indexes the teams.
Each team plays once with the other $n - 1$ teams in each half of the tournament; a team A plays
with B in the first half of the tournament and B plays with A in the second, i.e. the same game
occurs twice in the tournament and there is no distinction between home and away matches (the
"home court advantage" could be inserted in the potential of the teams). In a game between team
$i$ and team $j$, the probability that $i$ beats $j$ is given by

\[ \Pr(i \succ j) = \frac{\varphi_i}{\varphi_i + \varphi_j}. \tag{1} \]

The number of games in the tournament is $N = n(n - 1)$ and in each half of the tournament,
$n - 1$ rounds of $n/2$ games are played. In each round, a matching is performed over the teams
by a simple algorithm that considers all circular permutations to generate the games. We give an
illustration for $n = 6$ teams, starting with the configuration:

    1 2 3
    4 5 6

This configuration implies that in the first round, team 1 plays team 4, team 2 plays team 5 and team 3
plays team 6. To generate the second round, we keep team 1 fixed in its position and we rotate the
other teams clockwise:

    1 4 2
    5 6 3

Now, team 1 plays team 5, team 4 plays team 6 and team 2 plays team 3. After $n - 1 = 5$ rounds, the
system arrives at the last distinct configuration and every team has faced every other team exactly
once. We repeat the same process to simulate the second half of the tournament.
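The rotation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `round_robin_rounds` and `double_round_robin` are ours, and the two-row layout is assumed to be exactly the one shown in the $n = 6$ illustration.

```python
def round_robin_rounds(n):
    """Return the n - 1 rounds of a single round robin for an even number n of teams.

    Team 1 stays fixed while the other teams rotate clockwise around the
    two-row layout, as in the n = 6 illustration in the text.
    """
    if n < 2 or n % 2 != 0:
        raise ValueError("n must be an even number >= 2")
    half = n // 2
    top = list(range(1, half + 1))          # e.g. [1, 2, 3]
    bottom = list(range(half + 1, n + 1))   # e.g. [4, 5, 6]
    rounds = []
    for _ in range(n - 1):
        # Each column of the layout is one game of the current round.
        rounds.append(list(zip(top, bottom)))
        # Clockwise rotation with top[0] fixed: the first bottom team moves up
        # next to team 1 and the last top team drops to the end of the bottom row.
        new_top = [top[0], bottom[0]] + top[1:-1]
        new_bottom = bottom[1:] + [top[-1]]
        top, bottom = new_top, new_bottom
    return rounds


def double_round_robin(n):
    """Both halves of the tournament: the same sequence of rounds played twice."""
    rounds = round_robin_rounds(n)
    return rounds + rounds
```

For $n = 6$ the first two rounds returned are the two configurations shown above, and for $n = 20$ the double round robin contains $n(n - 1) = 380$ games.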

In our model, the outcome of each match is a draw with probability $r_{draw}$, and one team will
beat the other with probability $(1 - r_{draw})$; the winning team is decided by the probabilities defined
by Eq. (1). After each match, we increase $\varphi_i$ by one unit if team $i$ wins, decrease $\varphi_i$ by one unit if
team $i$ loses and $\varphi_i > 1$, and leave it unchanged in the case of a draw. Here, we used $r_{draw} = 0.26$,
which is the average draw probability in actual tournaments around the world. Actually, we observe
that $r_{draw}$ ranges from 0.24 (Spanish La Liga) to 0.28 (Italian Calcio); see Table 1.

Besides this, the team is awarded points according to the 3-1-0 scheme. In each new match, the 
updated potentials are considered and the second half of the tournament begins with the conditions 
acquired by the teams in the first half. The team evolution dynamics is briefly described by the 
following algorithm: 

Main Algorithm

    if (rand[0,1] < r_draw) then
        p_i = p_i + 1 and p_j = p_j + 1;
    else
        if (rand[0,1] < phi_i / (phi_i + phi_j)) then
            p_i = p_i + 3; phi_i = phi_i + 1; phi_j = phi_j - 1;
        else
            p_j = p_j + 3; phi_i = phi_i - 1; phi_j = phi_j + 1;
        endif
    endif

Here $p_i$ denotes the tournament points and phi_i the potential $\varphi_i$ of team $i$. It is important
to notice that the algorithm works under the constraint $\varphi_j \geq 1$ for every $j$. It
is important to mention that the arbitrary choice of increments equal to one unit is irrelevant, since
it is possible to alter the relative change in potential by assigning it different starting values. For
example, if team A is matched against team B in a certain round, we can denote by $N_A$ and $N_B$ the
difference between the number of wins and losses up to that round for each team ($N_A$ and $N_B$ can be
negative or positive integers). We can then write $\varphi_A = \varphi_0 + N_A$ and $\varphi_B = \varphi_0 + N_B$, so our model works
with unitary increments/decrements, i.e., $\Delta\varphi = 1$. As can be observed, for arbitrary $\Delta\varphi$ we have
invariance of probability:

\[
\Pr(A \succ B) = \frac{\varphi_A}{\varphi_A + \varphi_B}
= \frac{\varphi_0\Delta\varphi + N_A\Delta\varphi}{\varphi_0\Delta\varphi + N_A\Delta\varphi + \varphi_0\Delta\varphi + N_B\Delta\varphi}
= \frac{\tilde{\varphi}_0 + N_A\Delta\varphi}{\tilde{\varphi}_0 + N_A\Delta\varphi + \tilde{\varphi}_0 + N_B\Delta\varphi}
= \frac{\tilde{\varphi}_A}{\tilde{\varphi}_A + \tilde{\varphi}_B},
\tag{2}
\]

where $\tilde{\varphi}_0 = \varphi_0\Delta\varphi$ is the rescaled initial potential, $\tilde{\varphi}_A = \tilde{\varphi}_0 + N_A\Delta\varphi$ and
$\tilde{\varphi}_B = \tilde{\varphi}_0 + N_B\Delta\varphi$. This simple calculation shows that we can start from
an arbitrary potential $\tilde{\varphi}_0$ for the teams and have exactly the same results if we perform increments
according to $\Delta\varphi$. In this case our main algorithm must be changed to increment/decrement by $\Delta\varphi$
instead of 1, and it depends on one parameter only, i.e., $\tilde{\varphi}_0/\Delta\varphi$.
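As a minimal sketch, the match update of the Main Algorithm and the invariance of Eq. (2) can be written in Python as follows. The names `win_probability` and `play_match`, and the parameter `delta` standing for $\Delta\varphi$, are ours. We also assume, as our own reading, that the lower bound on the potential scales with $\Delta\varphi$ when the potentials are rescaled.

```python
import random


def win_probability(phi_i, phi_j):
    """Eq. (1): probability that team i beats team j in a match that is not drawn."""
    return phi_i / (phi_i + phi_j)


def play_match(i, j, phi, points, r_draw=0.26, delta=1.0, rng=random):
    """Play one match between teams i and j, updating phi and points in place.

    Returns the index of the winner, or None for a draw.
    """
    if rng.random() < r_draw:
        points[i] += 1
        points[j] += 1
        return None
    if rng.random() < win_probability(phi[i], phi[j]):
        winner, loser = i, j
    else:
        winner, loser = j, i
    points[winner] += 3
    phi[winner] += delta
    # Assumption: the floor phi >= 1 of the unit-step model becomes phi >= delta
    # after rescaling; the small tolerance guards against float round-off.
    if phi[loser] - delta >= delta - 1e-12:
        phi[loser] -= delta
    return winner
```

With `delta=1.0` this reduces to the unit-step rule of the text. Multiplying both potentials by the same factor leaves `win_probability` unchanged, which is the content of Eq. (2).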

3. Results Part I: Exploring the first model - Calibrating parameters 

Before comparing our model with real data from tournaments played under the DRRS, it is
interesting to study some of its statistical properties. Given $n$ teams, one run of the algorithm
will generate a final classification score for each team. For instance, starting with $n = 20$ teams
with the same potential $\varphi_0 = 30$, a possible final classification score generated by our algorithm in
increasing order is [23, 28, 39, 41, 44, 45, 47, 48, 49, 53, 54, 57, 60, 61, 62, 62, 64, 64, 65, 72]. To
obtain significant information from the model, it is necessary to average these data over different
random number sequences. To that end, we compute histograms of final score distributions over
$n_{run} = 100$ different simulated tournaments, for a varying number of teams.
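A compact, self-contained sketch of one full run with constant $r_{draw}$ is given below. It is not the authors' code: the function names are ours, teams are indexed from 0, and the rounds are generated with a standard variant of the circle method (team 0 fixed, the others rotating), which pairs the teams differently from the two-row layout of Section 2 but still makes every pair meet once per half.

```python
import random
from collections import deque


def simulate_tournament(n=20, phi0=30.0, r_draw=0.26, seed=None):
    """Simulate one double round robin and return the final scores in increasing order."""
    if n < 2 or n % 2 != 0:
        raise ValueError("n must be an even number >= 2")
    rng = random.Random(seed)
    phi = [float(phi0)] * n   # potentials, carried over into the second half
    points = [0] * n
    for _half in range(2):
        ring = deque(range(1, n))          # teams 1..n-1 rotate, team 0 is fixed
        for _round in range(n - 1):
            order = [0] + list(ring)
            for k in range(n // 2):
                i, j = order[k], order[n - 1 - k]
                if rng.random() < r_draw:  # draw: one point each, potentials unchanged
                    points[i] += 1
                    points[j] += 1
                    continue
                if rng.random() >= phi[i] / (phi[i] + phi[j]):
                    i, j = j, i            # after the swap, i is always the winner
                points[i] += 3
                phi[i] += 1.0
                if phi[j] > 1.0:           # the loser's potential never drops below 1
                    phi[j] -= 1.0
            ring.rotate(1)
    return sorted(points)


def pooled_scores(n_run=100, seed=0, **kwargs):
    """Pool the final scores of n_run independent tournaments (one seed per run)."""
    return [s for run in range(n_run)
            for s in simulate_tournament(seed=seed + run, **kwargs)]
```

A call such as `pooled_scores(n_run=100, n=20, phi0=30)` returns 2000 scores from which a histogram can be built. The exact numbers depend on the random seed, so no specific output is claimed here.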

In Fig. 1(a), we display the relative frequency of scores as a function of the rescaled score,
considering all teams initially with $\varphi_0 = 2$, for varying tournament sizes $n$. Under this regime
of low $\varphi_0$, the changes in potential according to the algorithm generate large fluctuations in the
winning/losing probabilities and a double peak pattern is observed in the histograms.

Figure 1: (a) The score histogram rescaled by the number of teams, in the large fluctuation regime,
for $\varphi_0 = 2$. The histogram was generated from an average over $n_{run} = 100$ different tournaments
simulated by our automaton. We simulated tournaments with $n$ = 20, 40, 80, 160, 320, and 640 teams. (b)
The accumulated frequency of classification scores: the number of teams with a score smaller than a
given score, divided by the number of teams.

For a study of size scaling, we consider our histogram as a function of the rescaled score (the score
divided by the number of teams $n$), since the larger the tournament, the larger are the team scores
(number of points). This double peak
shows that our dynamics leads to two distinct groups: one that disputes the leadership and the
other that fights against relegation to lower tiers. In Fig. 1(b), we plot the cumulative frequency
as a function of the rescaled score (which essentially counts how many teams have scores smaller than
or equal to a given score). We can observe an interesting behavior due to the presence of extra inflection
points that make the concavity change sign, and the non-Gaussian behavior of the scores, independent
of the size of the tournaments. Although the distributions are clearly non-Gaussian, because of the
double peak and the "S"-shaped cumulative frequency, the Kolmogorov-Smirnov (KS) and Shapiro-Wilk (SW)
tests (references and routine codes of these tests are found in [8]) were performed to quantify the
departure from Gaussianity. An important methodological point is that the KS test [9] can also be applied
to test other distributions, whereas the SW test is used specifically for normality.
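The cumulative frequency and the KS comparison with a Gaussian can be illustrated with the standard library alone. The sketch below is not the routine of [8]: it only computes the KS distance between the empirical cumulative frequency and a normal distribution fitted to the sample. Comparing it with the asymptotic 5% value $1.36/\sqrt{n}$ is a rough check, and a conservative one, because the mean and standard deviation are estimated from the same data.

```python
import math
from statistics import mean, stdev


def cumulative_frequency(scores, s):
    """Fraction of teams whose score is smaller than or equal to s."""
    return sum(1 for x in scores if x <= s) / len(scores)


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian with mean mu and deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic_normal(scores):
    """Largest gap between the empirical cumulative frequency and a fitted normal CDF."""
    xs = sorted(scores)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)
    d = 0.0
    for k, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # The empirical CDF jumps from k/n to (k + 1)/n at x; check both sides.
        d = max(d, abs((k + 1) / n - f), abs(f - k / n))
    return d


# Example final scores quoted in the text (n = 20, phi_0 = 30).
example = [23, 28, 39, 41, 44, 45, 47, 48, 49, 53,
           54, 57, 60, 61, 62, 62, 64, 64, 65, 72]
d = ks_statistic_normal(example)
looks_gaussian = d < 1.36 / math.sqrt(len(example))  # approximate 5% criterion
```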

By repeating the experiment for $\varphi_0 = 30$, a transition from a single peak to a double peak can be
observed at around $n \approx 40$, as shown in Fig. 2(a). Under this condition, wins and losses cause
small changes in the winning/losing probabilities, simulating a tournament under the "adiabatic"
regime.

Figure 2: (a) The histogram of scores rescaled by the number of teams, in the small fluctuation regime,
for $\varphi_0 = 30$, under the same conditions as Figure 1. (b) The accumulated frequency for $\varphi_0 = 30$,
under the same conditions as Figure 1.

We observe that this interesting behavior is reflected in the curves of cumulative frequencies,
which change from single to double "S"-shaped in Fig. 2(b). It is interesting to verify whether
this tournament model is able to mimic the score statistics of real tournaments and, if so, under
what conditions. To answer this question, we need to explore real tournament statistics. In
Table 1 we show the compiled data of the last 6 editions of important soccer tournaments around
the world: Italian, Spanish, and Brazilian. We collect data about the scores of the champion teams
(maximum) and last-placed teams (minimum). We average these statistics over all studied editions and we
analyze the Gaussian behavior of the score data for each edition separately (20 scores) and grouped
(120 scores) by using two methods, Shapiro-Wilk and Kolmogorov-Smirnov, at a significance
level of 5%. The average number of draws per team was also computed, which shows that
$r_{draw} \approx 10/38 \approx 0.26$, corroborating the input used in our previous algorithm.
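The conversion from an average number of draws per team to a draw probability is simple arithmetic: under the DRRS each of the $n = 20$ teams plays $2(n - 1) = 38$ games. A short sketch (the function name `draw_rate` is ours) applied to the "all" column of Table 1:

```python
def draw_rate(draws_per_team, n_teams=20):
    """Fraction of a team's games that end in a draw in a double round robin."""
    games_per_team = 2 * (n_teams - 1)
    return draws_per_team / games_per_team


# Average draws per team over six editions ("all" column of Table 1).
rates = {
    "Calcio": draw_rate(10.8),       # about 0.28
    "La Liga": draw_rate(9.1),       # about 0.24
    "Brasileirao": draw_rate(10.1),  # about 0.27
}
overall = draw_rate((10.8 + 9.1 + 10.1) / 3)  # about 0.26
```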

Table 1: Compiled data of important tournaments around the world: Italian, Spanish, and Brazilian Leagues

| League | Statistic | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | all |
|---|---|---|---|---|---|---|---|---|
| Italian* (Calcio) | minimum | 35 | 26 | 30 | 30 | 29 | 24 | 29(2) |
| Italian* (Calcio) | maximum | 86 | 97 | 85 | 84 | 82 | 82 | 86(2) |
| Italian* (Calcio) | Kolmogorov-Smirnov | no | yes | yes | yes | yes | yes | no |
| Italian* (Calcio) | Shapiro-Wilk | no | no | yes | yes | yes | yes | no |
| Italian* (Calcio) | draws (average per team) | 12.5 | 11.4 | 11.2 | 9.5 | 10.2 | 9.7 | 10.8(5) |
| Spanish (La Liga) | minimum | 24 | 28 | 26 | 33 | 34 | 30 | 29(2) |
| Spanish (La Liga) | maximum | 82 | 76 | 85 | 87 | 99 | 96 | 87(4) |
| Spanish (La Liga) | Kolmogorov-Smirnov | yes | yes | yes | yes | yes | yes | no |
| Spanish (La Liga) | Shapiro-Wilk | yes | yes | yes | yes | no | no | no |
| Spanish (La Liga) | draws (average per team) | 10.5 | 9.8 | 8.7 | 8.3 | 9.5 | 7.9 | 9.1(4) |
| Brazilian (Brasileirao) | minimum | 28 | 17 | 35 | 31 | 28 | 31 | 28(2) |
| Brazilian (Brasileirao) | maximum | 78 | 77 | 75 | 67 | 71 | 71 | 73(2) |
| Brazilian (Brasileirao) | Kolmogorov-Smirnov | yes | yes | yes | yes | yes | yes | yes |
| Brazilian (Brasileirao) | Shapiro-Wilk | yes | no | yes | yes | yes | yes | yes |
| Brazilian (Brasileirao) | draws (average per team) | 9.7 | 9 | 9.6 | 10.2 | 11.8 | 10.5 | 10.1(4) |

* The 2006 year (which corresponds to season 2005/2006) was replaced by 2004/2005 in Calcio,
since cases of corruption among referees led to changes in team scores, with points being
deducted from some teams and assigned to others. Here "yes" denotes a positive normality test
and "no" denotes the opposite, at a significance level of 5%.

Some observations about this table are useful. The traditional European tournaments based
on the DRRS have non-Gaussian traces, as opposed to the Brazilian league, a tournament still in
its infancy under this system. This fact deserves some analysis: in Brazil, over the last 6 editions
(compiled data are presented in Table 1), 4 different football clubs have won the league. If we
consider all 10 disputed editions, we have 6 different champions, which shows the great diversity of
this competition. The Brazilian league seems to display a greater level of randomness when compared
to the European tournaments. A similarity among teams suggests that favorites are not always
crowned champions and that many factors and small fluctuations can be decisive in the determination
of the champion. This may also indicate that the Brazilian tournament has an abundance of players
of similar level, unlike the Italian tournament, in which the traditional teams are able
to hire the best players or have well-managed youth teams, or even sign the ones who play for the
Italian national team. Consider for example Real Madrid and Barcelona in Spain: they govern the
tournament by signing the best players, even from youth teams from abroad (as is the case with the
World Player of the Year, Lionel Messi, who joined Barcelona at age 13 from Argentina). It is not
uncommon for a player who has stood out in the Brazilian or other Latin American championships to
be hired to play in Europe for the next season, further contributing to the lack of continuity from
one season to the next and to the "randomization" of the teams.

In Brazil, there is not a very large financial or economic gap among teams, and although
favorites are frequently pointed out by sports pundits before the beginning of the tournament, the
pundits are typically not able to pick the winners beforehand. In fact, many dark horses, not initially
pointed out as favorites, end up winning the league title. This suggests that, in Brazil, the
champions emerge from very noisy scenarios, as opposed to other tournaments that only confirm the
power of a (favorite) team. One could add to that the existence of a half-season-long local
(state-based) tournament, which makes the predictions widely reported in the press not very
reliable. Therefore, it is interesting to check our model by studying its statistical
properties, changing parameters and then comparing the model with real data.

We perform an analysis of our model considering different initial parameters $\varphi_0 = 2$, 10 and 30
and the different tournaments (see plot (a) in Fig. 3). We have used $r_{draw} = 0.26$ in our simulations.
It is possible to observe that the model fits the Brazilian league very well for $\varphi_0 = 30$ and $\Delta\varphi = 1$.
It is important to mention that extreme values (minimal and maximal) are reproduced with very
good agreement. For example, plot (b) in Fig. 3 shows that minimal and maximal values obtained
by our model (full squares and circles respectively, in black) are very similar to the ones obtained
from the six editions of the Brazilian tournament (open squares and circles, in blue). We also
plot continuous lines that represent the average values obtained in each case. This shows that our
model and its fluctuations capture the nuances and emerging statistical properties of the Brazilian
tournament which, however, seems not to be the case for Calcio and La Liga. Plot (a) of Fig. 3
shows that the cumulative frequencies of these two tournaments are very similar to one another and
that no value of the parameter $\varphi_0$ (many others were tested) is capable of reproducing their data.

Now a question that can quickly come to mind to readers of this paper is: are we modeling 
something that is entirely random and non-evolutionary, i.e., could we use a simpler model? The 
answer, fortunately, is "not really". To understand this, let us suppose a completely random and 
non evolutionary model (the probabilities do not change with time), in which a team should win, 
lose, or draw with the same probability: 1/3. 

A comparison of the best fit of our model (evolutionary) with the totally random (static) model,
under the exact same conditions of 20 teams under the DRRS, is shown in Fig. 4. We observe that
the static model does not reproduce the lower and upper values, nor the shape of the cumulative
frequency of the Brasileirao, which, on the other hand, is very well fitted by our model.
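For reference, the static model is easy to simulate. The sketch below (the function name is ours) plays every ordered pair of teams once, so each pair meets twice, with win, loss and draw each having probability 1/3. Because the probabilities never change, the order of the games is irrelevant here. In this model a team earns on average $(3 + 1 + 0)/3 = 4/3$ points per game, i.e. about 50.7 points over 38 games, with a per-team standard deviation of about $\sqrt{38 \cdot 14/9} \approx 7.7$ points.

```python
import random


def simulate_static_tournament(n=20, seed=None):
    """Double round robin in which every game is a win, loss or draw with probability 1/3."""
    rng = random.Random(seed)
    points = [0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # One game per ordered pair (i, j): each pair of teams meets twice.
            u = rng.random()
            if u < 1.0 / 3.0:
                points[i] += 3
            elif u < 2.0 / 3.0:
                points[j] += 3
            else:
                points[i] += 1
                points[j] += 1
    return sorted(points)
```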
Figure 3: (a) A comparison between results produced by our model, using different initial parameters
$\varphi_0 = 2$, 10 and 30, and different tournaments. We have used $r_{draw} = 0.26$. Results were obtained considering 6
runs of our artificial tournament. We can observe that the model fits the Brazilian league (black continuous curve
obtained from 6 editions of the Brazilian league) precisely for $\varphi_0 = 30$. On the other hand, Calcio and La Liga are
not reproduced by our model, indicating clear differences between such tournaments and the Brasileirao. (b)
Minimal and maximal values obtained by our model (full squares and circles respectively,
depicted in black) are very similar to the ones obtained in the six editions of the Brazilian League (open squares and
circles, in blue). The continuous line corresponds to the average values obtained in each case for a comparison.

4. Second model: Draw probabilities emerging from the model itself 

Previous authors (see, for example, [5] and the other goal-distribution studies cited in the
Introduction) claim that in a match between two teams A and
B, the probability that the result is $(n_A, n_B)$, where $n_i$ is the number of goals scored by team $i$, can
be approximated by a Poisson distribution:

\[ \Pr[(n_A, n_B) \mid (\phi_A, \phi_B)] = \Pr(n_A, \phi_A) \cdot \Pr(n_B, \phi_B), \tag{3} \]

where $\phi_A$ and $\phi_B$, the average number of goals in a game, are taken as the abilities of the teams.

It is very interesting to work with models that have as few free parameters as possible; the
imposition of an ad-hoc draw probability in our previous version of the model can therefore be
seen as a shortcoming. It is possible to overcome this problem, and at the same time maintain
the same model properties, by using Eq. (3) as a means to calculate $r_{draw}$ in the previously defined
algorithm. Given two teams, with potentials $\varphi_A$ and $\varphi_B$, we can calculate $r_{draw}$ by making the direct
identification of the abilities with our concept of potential, i.e., $\phi_A = \varphi_A$ and $\phi_B = \varphi_B$, so that

\[
r_{draw} = \Pr[(n_A = n_B) \mid (\phi_A, \phi_B)]
= \sum_{n=0}^{\infty} \Pr(n, \phi_A) \Pr(n, \phi_B)
= \sum_{n=0}^{\infty} \frac{(\phi_A \phi_B)^n}{(n!)^2} e^{-(\phi_A + \phi_B)},
\]

leaving our model with only one free parameter.

Figure 4: Comparison of our model (evolutionary) and a totally random (static) model. We can observe that the
static model does not reproduce the lower and upper values, nor the shape of the cumulative frequency of the
Brazilian League, which, on the other hand, is very well fitted by our model.

The first important point is that the draw probability is now independent of previous ad-hoc information
obtained from tournaments, arising instead as a property of the teams themselves, i.e., their ability to
score goals. A plot of $r_{draw}$ as a function of $\phi_A$ and $\phi_B$ is shown in Fig. 5. However, it is important
to mention that this definition must be adapted if $\varphi$ is not exactly the average number of goals of the
team per match ($\phi$): since the number of goals in a match is finite, its extension to infinity can
have drastic consequences for the draw probabilities. It should be noted that the potential of the
teams (which may be rescaled) represents the abilities of the teams but can be very different from the
average number of goals scored by the teams in a given match. In this case, a solution to this
problem is to consider a truncated Poisson function

\[ f^{trunc}(n, \phi) = Z(\phi, m)^{-1} \frac{\phi^n e^{-\phi}}{n!}, \qquad 0 \leq n \leq m, \]

where

\[ Z(\phi, m) = \sum_{j=0}^{m} \frac{\phi^j e^{-\phi}}{j!}, \]

with $m$ being the appropriate cutoff for modeling, which must be suitably adjusted. Therefore, $r_{draw}$
is now re-written as

\[
r_{draw} = Z(\phi_A, m)^{-1} Z(\phi_B, m)^{-1} \sum_{n=0}^{m} \frac{(\phi_A \phi_B)^n}{(n!)^2} e^{-(\phi_A + \phi_B)}.
\tag{4}
\]

Figure 5: The probability $r_{draw}$ as a function of $\phi_A$ and $\phi_B$.

This is a solution, but $m$ must be adjusted according to the initial potential $\varphi_0$ if we use $\Delta\varphi = 1$.
However, it is also possible to solve this problem by suitably scaling the potential to be the average
number of goals in a match. Therefore, if we start the simulations with a rescaled initial potential
representing the average number of goals of a real tournament, $\tilde{\varphi}_0 = \lambda = \varphi_0\Delta\varphi$, then the increment must be
given by $\Delta\varphi = \lambda/\varphi_0$, so that the win/lose probabilities are kept fixed, according to equation
(2). In this case, $m \to \infty$ presents the best fits and gives the correct draw probabilities $r_{draw}$, making
the model again more suitable, since $m$ is not an experimental parameter.

In the next section, we will present our results based on this new approach for the calculation
of $r_{draw}$ and we show that real data are also reproduced by both methods presented in this section.

5. Results Part II: variable draw probabilities 

We perform new simulations considering our previously described algorithm, but allowing for 
variable draw probabilities. As before, we organize the teams via the DRRS, starting with fixed 
potentials and take averages over many runs of the model. Our first analysis was to reproduce the 
final score of the Brazilian tournament tuning different values of m. 



Figure 6: Plots of $r_{draw}$ given by equation (4) for different values of $m$ (panels: $m = 10$, $m = 20$ and $m = 30$).

Figure 6 shows the surfaces corresponding to $r_{draw}$ calculated by equation (4). We can see
that higher $m$ values result in higher draw probabilities for low potentials. Figure 7 shows the
cumulative distribution of simulated final scores from our artificial tournament generated by the
model considering the variable $r_{draw}$ given by equation (4). Three different values of $m$ (10, 20
and 30) were tested. We can show that the best value to fit the real data extracted from the same
6 Brazilian soccer tournaments (continuous curve) is $m = 20$. All teams started the simulations
with $\varphi_0 = \phi_0 = 30$, calibrated in the previous results developed in Results Part I.

Since $m$ is not an accessible parameter of tournaments, we can start from a rescaled initial
potential $\tilde{\varphi}_0 = 1.57$, which corresponds to the average number of goals scored by a team
in a match of the Brazilian tournament studied, and adjust $\Delta\varphi = 1.57/30$ in our algorithm. In
this case, an excellent fit (see Figure 8) is obtained, considering the regular Poisson distribution
($m \to \infty$). Naturally, the number 30 follows from the initial calibration of our model (when we
fixed $r_{draw} = 0.26$, $\varphi_0 = 30$ led to an excellent fit, as we previously observed).
Figure 7: Cumulative distribution of the simulated tournament generated by the model considering the variable $r_{draw}$
given by equation (4). We show that the best value to fit the real data extracted from the same 6 Brazilian soccer
tournaments (continuous curve) is $m = 20$.

As can be seen in the figures, we can obtain good fits with this new version of the model, which is
a little more complicated than the previous version with constant draw probabilities, but uses
information inherent in the model itself.

Finally, to test some scaling properties of the model, we reproduce the same plot of figure [2] 
using r^ ravv , according to |4l which is shown in plot (a) in figure |9j This figure shows histograms of 
final scores generated by n run =100 different simulated tournaments using variable probabilities 
r draw We can observe that the transition from one to two peaks is fully due to the imposition that a 
team in our model has a minimal potential <p > ; 1 . This effect can be overcome if we scale <po with 
size system and, it is possible to collapse the curves by re-writing the scores as normal standard 
variables, i.e., 

t \ I \ */ N S ( n ) ~ ( S ( n )) 

score(n) = s{n) -> s*{n) = \ J \ " . 

Thus, if $H(s^*(n), \varphi_0, n)$ denotes the histogram of normalized scores, we have the scaling relation 
$H(s^*(n_1), \varphi_0, n_1) = H(s^*(b n_1), b\varphi_0, b n_1)$. Plot (b) in figure [9] shows this scaling. We take the 
logarithm of the histogram to show the collapse more explicitly. The small inset plot is shown 
without the logarithm. We can see that different tournaments can present the same properties as 
long as the teams' potentials are rescaled. 
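
The normalization used for the collapse can be sketched in a few lines of Python. This is a minimal illustration: the function name `standardize` and the sample scores are hypothetical, and taking the mean and standard deviation over the whole pool of scores being histogrammed is an assumption.

```python
import statistics


def standardize(scores):
    """Map raw final scores s to s* = (s - mean) / standard deviation."""
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)  # population standard deviation
    if std == 0:
        raise ValueError("all scores are equal; s* is undefined")
    return [(s - mean) / std for s in scores]


# Hypothetical final scores of a small league, for illustration only.
raw_scores = [78, 71, 65, 60, 58, 52, 47, 45, 39, 31]
normalized = standardize(raw_scores)
print([round(x, 2) for x in normalized])
```

After this transformation every pool of scores has zero mean and unit variance, so histograms from tournaments of different sizes can be compared on the same axis.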
Figure 8: Cumulative distribution of the simulated tournament generated by the model, considering $r_{draw}$ calculated 
from the standard Poisson distribution ($m \to \infty$) (black circles), averaged over 6 repetitions. The continuous curve shows 
the 6 editions of the Brazilian soccer tournament. 

Summary and Conclusions 

In this paper, we have explored a new model that reproduces in detail the final classification 
score (standings) of the teams in the Brazilian Soccer tournament. The Brazilian tournament, as 
opposed to other tournaments such as the Italian and Spanish Leagues, has some peculiarities and 
seems to display scores that emerge from a dynamics that preserves its Gaussian traces. This can 
be justified by several reasons: Brazilian tournaments have many distinct champions and the com- 
petition is not dominated by a few teams. Favorite teams frequently perform badly, and there is an 
inexhaustible source of new players, making the tournament more balanced and very susceptible 
to small fluctuations. Our model also seems to be a good laboratory to study fluctuations that may 
happen in large tournaments. More specifically, the model presents a transition from a single-peaked to a 
double-peaked histogram of final scores (standings); the two peaks correspond to disputes near the 
champion's region and near the region of the last-placed team. Moreover, we also presented 
results on the scaling of histograms of final scores and showed that tournaments of 
different sizes based on our model collapse onto the same curve when we consider standardized 
final scores and a linear scaling of the potentials. 

Here, it is important to mention that after the present contribution was completed, we were 
alerted to the existence of a similar model with more parameters proposed to study statistical 
properties of tournaments. However, our contribution is very different, because in that study the 
matches are generated under the mean field approximation regime based on a Markovian random 
walk. In such an approximation, therefore, the teams do not evolve in time. 

Figure 9: Plot (a): Histograms of final scores generated with $n_{run} = 100$ different simulated tournaments using variable 
$r_{draw}$. The figure is similar to figure [2] since the same $\varphi_0 = 30$ was used. Plot (b): Scaling corresponding to plot (a). 

About data extraction for the validation of the model 

Table [1] shows the data from real tournaments used to compare with the results produced by our 
model, as illustrated in Figs. [3] and [4]. The data (publicly available at http://www.wikipedia.org/) 
are based on records from the Italian, Spanish, and Brazilian tournaments during the 2005/2006 - 
2010/2011 seasons. For the Italian Calcio, the year 2006 (which corresponds to season 2005/2006) 
was replaced by 2004/2005, since cases of corruption and a referee scandal in 2005/2006 have 
supposedly changed the scores of teams, and points were reduced from some teams and awarded to 
others. To obtain the data from our model, we implemented a simple algorithm in the FORTRAN 
language which computes the possible games according to the DRRS system and evolves the 
potential and points of teams, producing a final classification score, or even a large number of final 
classification scores. This is used to plot Figures [1] and [2], which explore the details of the model. 
All other figures of the paper represent a comparison of the data extracted from Wikipedia and 
those produced by our model. 
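
The procedure described above can be sketched as follows. This is an illustrative Python sketch, not the FORTRAN code used for the paper. It assumes that DRRS denotes a double round-robin (each pair of teams meets twice) and that the standard 3/1/0 points system applies. The update rule is also an assumption for illustration: a constant draw probability, a win probability proportional to the current potentials, and a potential change of one unit per decided match. All function names and the random match order are hypothetical.

```python
import itertools
import random


def drrs_schedule(n_teams):
    """All (home, away) pairings of a double round-robin: each pair meets twice."""
    return list(itertools.permutations(range(n_teams), 2))


def simulate_tournament(n_teams=20, phi0=30, r_draw=0.26, seed=0):
    """Return the final points of each team under the ASSUMED rules above."""
    rng = random.Random(seed)
    potential = [phi0] * n_teams
    points = [0] * n_teams
    schedule = drrs_schedule(n_teams)
    rng.shuffle(schedule)  # ASSUMPTION: matches are played in random order
    for a, b in schedule:
        if rng.random() < r_draw:
            # Draw: one point each, potentials unchanged (assumed).
            points[a] += 1
            points[b] += 1
            continue
        # ASSUMPTION: decided matches are won in proportion to potentials.
        p_a = potential[a] / (potential[a] + potential[b])
        winner, loser = (a, b) if rng.random() < p_a else (b, a)
        points[winner] += 3
        potential[winner] += 1
        # The potential never drops below the minimal value of 1.
        potential[loser] = max(1, potential[loser] - 1)
    return points


final_points = simulate_tournament()
print(sorted(final_points, reverse=True))
```

Calling `simulate_tournament` with different seeds gives an ensemble of final classification scores, from which histograms or cumulative distributions can be built.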

Acknowledgments 

The authors thank CNPq (The Brazilian National Research Council) for its financial support. 

References 



[1] R. Onody, P. Castro, Complex network study of Brazilian soccer players, Physical Review E 70 (2004) 037103. 

[2] L. Malacarne, R. Mendes, Regularities in football goal distributions, Physica A 286 (2000) 391. 

[3] E. Bittner, A. Nussbaumer, W. Janke, M. Weigel, Self-affirmation model for football goal distributions, 
Europhysics Letters 78 (2007) 58002. 

[4] E. Bittner, A. Nussbaumer, W. Janke, M. Weigel, Football fever: goal distributions and non-Gaussian statistics, 
The European Physical Journal B 67 (2009) 459. 

[5] G. Skinner, G. Freeman, Are soccer matches badly designed experiments?, Journal of Applied Statistics 36 
(2009) 1087. 

[6] G. Yaari, S. Eisenmann, The hot (invisible?) hand: Can time sequence patterns of success/failure in sports be 
modeled as repeated random independent trials?, PLoS ONE 6 (2011) e24532. 

[7] A. Kranjec, M. Lehet, B. Bromberger, A. Chatterjee, A sinister bias for calling fouls in soccer, PLoS ONE 5 
(2010) e11667. 

[8] W. Press, S. Teukolsky, W. Vetterling, B. Flannery, Numerical Recipes: The Art of Scientific Computing, 
Cambridge University Press, New York, USA, 2007. 

[9] S. Garpman, J. Randrup, Statistical test for pseudo-random number generators, Computer Physics 
Communications 15 (1978) 5. 

[10] H. Ribeiro, R. Mendes, L. Malacarne, S. Piccoli Jr, P. Santoro, Dynamics of tournaments: The soccer case, The 
European Physical Journal B 75 (2010) 327. 